Title
Acquisition of Hand-Delineated Historic Watershed Boundaries from USGS Topographic Maps

Author Information
Bernie McNamara
American River College, Geography 350: Data Acquisition in GIS; Spring 2006
Contact Information: bjmcnam@usgs.gov

Abstract
Hand-delineated historic watershed boundaries on published paper topographic maps have been used by USGS hydrologists to calculate drainage basin areas since the 1940s. The growing use of GIS within the agency has made the acquisition and conversion of this pen-and-ink linework to a digital format imperative. Attempts at hand digitizing the delineated ink linework from the hundreds of source paper quadrangle maps on a traditional digitizing table have been unsuccessful and often resulted in poor-quality work. This project revisits the problem with a new perspective: to try the ESRI ArcGIS ArcScan raster-to-vector software extension.

A pilot study area consisting of a block of six contiguous USGS 7.5 minute series quadrangles in a geographic area of moderate topography in El Dorado County, California, was selected. The source paper quadrangles were captured as scanned TIFF files on a large-format scanner. The files were then processed using a combination of interactive and batch processing methods with ESRI software products to extract the target hand-drawn ink linework as raster data and convert the target cells to vector line datasets. A final basins vector dataset spanning the six source quadrangles was successfully created.


Introduction
Since the 1940s, hydrologists working in the USGS California District Water Resources division have delineated surface water drainage basins for a given gaging station on existing topographic maps. This is a very slow and labor-intensive process, and it requires expertise in the interpretation of contour lines and how they interact with topographic and hydrographic features on the base map. The watershed boundary delineation began on published USGS 15 minute base map quadrangles (scale 1:62500) and, from the 1950s on, progressed to the larger scale (1:24000) USGS 7.5 minute series quadrangles. The original linework was done in pencil, and after a rigorous review and update process, the final linework was traced by a steady hand in black Rapidograph ink. Once a basin boundary was completed, a drainage-area computation for the station could be estimated using a combination of mechanical and mathematical methods.

Over the past several decades, a historic dataset of basin boundaries and drainage-area computations has been incrementally developed, but it exists only on paper quadrangle maps. The need to convert this 'paper database' to a digital format has been recognized, but the effort has fallen short. Past attempts at hand digitizing the linework on a traditional digitizing table have not been successful, mostly because this type of work is always delegated to student and part-time help. With little consistency in the effort, coupled with the reliance on minimally skilled part-time GIS personnel and lapses in project funding, the effort has produced minimal results. This approach has since been abandoned.

For this project, I proposed to research current GIS technology and experiment with strategies that could be used to digitally capture the hand-delineated basin linework for the purpose of creating a vector dataset. This dataset would then serve as a prototype for an archival digital dataset which could be used for further analyses, an example of which is to use GIS methods to obtain 'upstream' contributing drainage areas for a given gaging station.

Background
Drainage-area computation is an important responsibility of the USGS Data program. The group is made up of hydrologists and hydrologic technicians who are responsible for the planning, maintenance, quality assurance, and collection of streamflow and water quality data through a network of gaging stations. They are also responsible for the analysis and publication of these data, which are used by many private consultants and government agencies for flood control forecasting, water availability studies, water quality studies, and for supporting hydrologic models.

The use of GIS technologies provides alternatives to pen-and-map analyses, as well as the creation of permanent spatial datasets. While raster-to-vector conversion algorithms have been in use for the last three decades, they were primarily the domain of powerful mainframe computer systems. The users of this specialized and very expensive software were generally software contractors and large government agencies (such as the US Department of Defense and NASA) with specific image processing needs, security considerations, and large budgets.

The ArcScan for ArcGIS extension from the GIS software developer ESRI is one of several raster-to-vector conversion applications that have moved those processing capabilities to the PC desktop platform. The ArcScan product can be used to capture and convert source raster data to a supported GIS vector format.

The study area for this project consists of six contiguous 7.5 minute series USGS topographic maps in a geographic area of moderate topography located in El Dorado County, California. The selected quadrangles include Aukum, Camino, Fiddletown, Omo Ranch, Placerville, and Sly Park.

Figure 1: Study unit location.

Additional information about the USGS gaging station networks, including real-time data reporting, can be obtained at the NWISWeb (National Water Information System) web site.

Methods
At the current release of ArcScan (version 9.1), vectorization of multi-value (colored) rasters is not supported. The input raster must be reclassified to a 'bi-level' format, where all cells belong to either a 'foreground' or 'background' classification. ArcScan targets only the foreground cells for vectorization. Hence it is important to become familiar with some of the raster processing tools available with the ESRI products, and much of the time spent on this project went into researching and developing those methods. I chose to use processing commands from both the ArcGIS Desktop and ArcInfo Workstation modules. This flexible approach was based on the need to use the tool best suited to each step of the process.

The ArcInfo Workstation GRID module was selected for completing all the image processing steps required by the project. While the image processing could have been done in the ArcMap Spatial Analyst extension, the GRID module was favored for its ease of automation. A simple AML script was developed to automate the repetitive steps of processing the source TIFF files into the required bi-level raster format.

Figure 2: Sample of the Scanned USGS Sly Park 7.5 Minute Quadrangle.

After the six contiguous source quadrangles were selected and reviewed for linework quality, the following processing steps were completed:

a) The paper quads were scanned on a large format scanner, courtesy of the US Bureau of Reclamation. The paper maps have dimensions of 20.5 inches by 27 inches.

b) Each quad was converted to an ESRI GRID raster format.

c) Each quad was georeferenced to a digital index coverage of the USGS 7.5 minute quads. The ArcMap interactive Georeferencing tool was the method of choice for completing this important task. A rectified raster grid was created from this step.

d) The black value cells were extracted from the rectified raster.

e) The final bi-level raster grid was created from the black grid by replacing all NODATA values with a numeric value of 0.

The following ArcInfo Workstation commands were used to process the source scanned TIFF images into the bi-level grid raster format required by the ArcScan extension:

1) Convert the source scanned TIFF file to an ESRI GRID raster format with the IMAGEGRID command.
imagegrid input_tiff out_grid out_colormap_file

2) After interactively rectifying the out_grid, the target black cell values were extracted using GRID functions. This process creates a new grid containing only cells representing black areas; all other cells are NODATA.
black_grid = con (slice (color2val (rectified_grid, out_color_file), eqinterval, 2) eq 2, 1)

3) Convert the black_grid output from the previous step to a bi-level raster grid.
bilevel_grid = con (isnull (black_grid), 0, black_grid)
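
The simple AML script mentioned in the Methods discussion tied these commands together for all six quads. The following is a minimal sketch of such a script; the quad abbreviations, file names, and grid names are hypothetical placeholders, and the interactive ArcMap georeferencing step is assumed to take place between the two loops.

/* bilevel.aml -- a minimal sketch of the batch processing described above.
/* Quad abbreviations, file names, and grid names are hypothetical placeholders.
&do quad &list auk cam fid omo pla sly
  /* step 1: convert the scanned TIFF to an ESRI GRID and its colormap file
  imagegrid %quad%.tif %quad%_grd %quad%.clr
&end
/* (each _grd grid is georeferenced interactively in ArcMap to create a _rec grid)
grid
&do quad &list auk cam fid omo pla sly
  /* step 2: extract the black cells from the rectified quad grid
  %quad%_blk = con(slice(color2val(%quad%_rec, %quad%.clr), eqinterval, 2) eq 2, 1)
  /* step 3: replace NODATA with 0 to create the bi-level grid required by ArcScan
  %quad%_bil = con(isnull(%quad%_blk), 0, %quad%_blk)
&end
quit
&return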

ArcScan Raster Cleanup

The extraction of black value cells from the rectified raster targets more than the intended ink linework. All black value cells are extracted, including map text, black symbols, and margin information. The non-target cells need to be removed prior to vectorization. This is another interactive step, but it can be done quickly. Empirical testing on a few quads indicated that connected collections of black cells with a total area of up to 1200 cells could be deleted in one pass, which effectively reduces 'scatter'. Each quad was then reviewed for non-target black cell features with a total area greater than 1200 cells, and these were deleted manually.

Figure 3: Settings used for raster cleanup

The above settings were used to clean up most of the noise and non-target cells prior to vectorization. A sample of the raster prior to cleanup is shown below.
Figure 4: Extracted black cells prior to raster cleanup.

In addition to the noise and other non-target cells, the raster extraction process creates holes in the linework that must be filled. If not corrected, these holes result in the formation of sliver polygons in the vectorized dataset.

Figure 5: Example of raster holes. These create sliver polygons.

Another problem with the raster extraction involves raster dangles. These are 'spurs' that are created, generally perpendicular to the trace route. There was no quick method for removing the spurs during raster processing, so I instead allowed them to be vectorized and removed them later using the ArcEdit vector processing tools.

Figure 6: Example of raster dangles.
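
The spur removal mentioned above was done with the ArcEdit vector processing tools after vectorization. A hypothetical ArcEdit session is sketched below; the coverage name and the length threshold are assumptions, and the actual spur lengths would need to be checked against the data first.

arcedit
  /* hypothetical output coverage from the vectorization step
  edit basins_vec arc
  /* select the short spur arcs (threshold is an assumed value in coverage units)
  select length < 10
  /* remove the selected spurs and save the edits
  delete
  save
quit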

ArcScan Vectorization Settings

The ArcScan extension provides two modes of operation: raster tracing and batch vectorization. Raster tracing is a highly interactive environment in which the operator directs the vectorization. The scope of this project was batch vectorization, which involves empirical testing of vectorization parameters to produce the end product. As its name indicates, there is no user interaction with batch processing; the vectorization is completed on the entire bi-level raster and the results are written to the designated shapefile. I tested many different parameter settings and settled on the parameters listed in the following figure.

Figure 8: ArcScan vectorization settings.

The last identified problem with the raster extraction involved 'gaps'. Several target line segments contained a gap for unexplained reasons. If vectorized, these gaps would create unclosed polygons. The Gap Closure Tolerance setting resolved most of these issues; the few remaining gaps not closed by this tolerance were easily identified and fixed with the ArcEdit vector processing tools.
Figure 9: Example of a raster gap.

Results

The following table summarizes information about the quadrangles used for the project. Note the significantly larger file size for the Fiddletown quadrangle. A review of the paper quads showed that it is the only map containing the green forest shading commonly used on USGS quads in forested areas. Although this did not affect the extraction of the targeted black cells, it does raise the question of whether an 'urban' shaded source map (pinkish red) would be affected.

Quadrangle USGS Index Scan Size (MB) RMS Error
Aukum 38120_E6 3.18 8.82206
Camino 38120_F6 6.36 6.64280
Fiddletown 38120_E7 11.09 6.49216
Omo Ranch 38120_E5 6.61 8.17913
Placerville 38120_F7 5.50 5.92782
Sly Park 38120_F5 8.31 1.89585



Final Basins vector dataset



Analysis

The pilot vector basins coverage created from the source raster scans was satisfactory. A remaining step that would have to be completed is basin attribution; that task was not addressed by this proposal.

Most of the project time was spent working through the image processing steps used to extract the target cells. Raster cleanup is the other area where more research and testing would most likely lead to improved efficiency.
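
One possible direction for automating the raster cleanup, not tested in this project, would be to remove the small clusters of black cells directly in GRID before vectorization, using the same 1200-cell threshold that was applied interactively with ArcScan. The following is a rough sketch under that assumption; the grid names are hypothetical.

grid
  /* group the connected black (value 1) cells of a bi-level grid into regions
  sly_rg = regiongroup(con(sly_bil == 1, 1))
  /* keep only regions larger than 1200 cells and restore the 0 background
  sly_cln = con(isnull(con(lookup(sly_rg, count) > 1200, 1)), 0, 1)
quit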

In the initial project proposal, I had planned to compare the extracted basin results to an existing watershed boundary dataset, such as the CALWATER221 or the USGS NHD hydrologic spatial datasets. I later realized this was not a valid comparison. The extracted basins are based on river segments, from one USGS gaging station to the next downstream (or upstream) gaging station, while the CALWATER221 and NHD watersheds are based on complete river systems. There would be no rationale for the comparison.

Conclusion
The ESRI ArcScan raster-to-vector conversion extension was used to successfully create a vector polygon watershed dataset from the raster sources. However, the process was not straightforward and required a significant amount of raster image processing to reach the end result. The effort used both ArcGIS Desktop and ArcInfo Workstation commands, each chosen for their value to the project. Automation is the key to creating a production-type process for capturing the remaining historic watershed boundaries.

The methods described in this project unfortunately include too many interactive processes. If a future project were spawned from this effort, I would recommend more research in the areas that can be automated: the raster cleanup steps and the vectorization settings used for batch processing. The georeferencing step remains the one process that cannot be automated.

Short of a full conversion of all paper maps to vector datasets, which could be a time-consuming and expensive project, I recommend that, at a minimum, all paper source maps that contain historic delineated watershed boundaries be scanned. That way the original sources are at least captured digitally and permanently backed up. Over time, the current paper maps are going to deteriorate and at some point will be lost.

In the near future, alternative GIS methods for creating watershed boundaries will become more acceptable and will replace these manual methods as the underlying datasets used to generate these synthetic boundaries become more detailed and accurate. The Arc Hydro Tools are the way of the future.

References

Kopp, Steve, et al., 2001. Using ArcGIS Spatial Analyst. ESRI Press, Redlands, CA.

Price, Curtis V. (raster specialist, US Geological Survey), 2006. Personal communication.

Robinson, Arthur H., et al., 1995. Elements of Cartography, pp. 187-197, 281-283. Wiley & Sons, Inc., New York.

Sanchez, Paul, 2002. Using ArcScan for ArcGIS. ESRI Press, Redlands, CA.

Twenty-Fifth ESRI International User Conference, 2005. ArcScan Technical Workshops, San Diego, CA.

US Geological Survey, 1952 (photorevised 1973). Aukum Quadrangle 7.5 Minute Series Topographic Map, scale 1:24000

US Geological Survey, 1952 (photorevised 1973). Camino Quadrangle 7.5 Minute Series Topographic Map, scale 1:24000

US Geological Survey, 1949. Fiddletown Quadrangle 7.5 Minute Series Topographic Map, scale 1:24000

US Geological Survey, 1952 (photorevised 1973). Omo Ranch Quadrangle 7.5 Minute Series Topographic Map, scale 1:24000

US Geological Survey, 1949 (photorevised 1973). Placerville Quadrangle 7.5 Minute Series Topographic Map, scale 1:24000

US Geological Survey, 1952 (photorevised 1973). Sly Park Quadrangle 7.5 Minute Series Topographic Map, scale 1:24000

US Geological Survey, 1983. California Index to Topographic and Other Map Coverage, National Mapping Program